Overview
Dataset statistics
| Statistic | Value |
|---|---|
| Number of variables | 20 |
| Number of observations | 3475226 |
| Missing cells | 2700745 |
| Missing cells (%) | 3.9% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 530.3 MiB |
| Average record size in memory | 160.0 B |
Variable types
| Type | Count |
|---|---|
| Categorical | 4 |
| DateTime | 2 |
| Numeric | 13 |
| Boolean | 1 |
Airport_fee is highly overall correlated with tolls_amount | High correlation |
RatecodeID is highly overall correlated with improvement_surcharge | High correlation |
VendorID is highly overall correlated with improvement_surcharge | High correlation |
congestion_surcharge is highly overall correlated with improvement_surcharge | High correlation |
fare_amount is highly overall correlated with total_amount and 1 other fields | High correlation |
improvement_surcharge is highly overall correlated with RatecodeID and 2 other fields | High correlation |
tip_amount is highly overall correlated with total_amount | High correlation |
tolls_amount is highly overall correlated with Airport_fee | High correlation |
total_amount is highly overall correlated with fare_amount and 2 other fields | High correlation |
trip_distance is highly overall correlated with fare_amount and 1 other fields | High correlation |
VendorID is highly imbalanced (62.0%) | Imbalance |
store_and_fwd_flag is highly imbalanced (97.4%) | Imbalance |
improvement_surcharge is highly imbalanced (89.3%) | Imbalance |
congestion_surcharge is highly imbalanced (67.8%) | Imbalance |
passenger_count has 540149 (15.5%) missing values | Missing |
RatecodeID has 540149 (15.5%) missing values | Missing |
store_and_fwd_flag has 540149 (15.5%) missing values | Missing |
congestion_surcharge has 540149 (15.5%) missing values | Missing |
Airport_fee has 540149 (15.5%) missing values | Missing |
trip_distance is highly skewed (γ1 = 260.0643046) | Skewed |
fare_amount is highly skewed (γ1 = 1859.998402) | Skewed |
total_amount is highly skewed (γ1 = 1857.764441) | Skewed |
trip_distance has 90893 (2.6%) zeros | Zeros |
payment_type has 540149 (15.5%) zeros | Zeros |
extra has 1764424 (50.8%) zeros | Zeros |
mta_tax has 38170 (1.1%) zeros | Zeros |
tip_amount has 1118008 (32.2%) zeros | Zeros |
tolls_amount has 3259590 (93.8%) zeros | Zeros |
Airport_fee has 2706446 (77.9%) zeros | Zeros |
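The missing-cell total in the dataset statistics can be cross-checked against the per-column alerts. The sketch below uses only figures quoted in this report (the variable names are illustrative) and suggests that the five columns with 540149 missing values each account for every missing cell, which is about 3.9% of the 20 × 3475226 cells.

```python
# Figures copied from the overview and the "Missing" alerts of this report.
n_rows = 3_475_226
n_cols = 20
missing_per_column = {
    "passenger_count": 540_149,
    "RatecodeID": 540_149,
    "store_and_fwd_flag": 540_149,
    "congestion_surcharge": 540_149,
    "Airport_fee": 540_149,
}

# Sum the per-column counts and express them as a share of all cells.
total_missing = sum(missing_per_column.values())
missing_pct = 100 * total_missing / (n_rows * n_cols)

print(total_missing)          # 2700745
print(f"{missing_pct:.1f}%")  # 3.9%
```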
Reproduction
| Analysis started | 2025-06-04 06:50:28.072454 |
|---|---|
| Analysis finished | 2025-06-04 07:01:52.736556 |
| Duration | 11 minutes and 24.66 seconds |
| Software version | ydata-profiling v4.16.1 |
| Download configuration | config.json |
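As a quick sanity check, the reported duration can be recomputed from the two timestamps above. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2025-06-04 06:50:28.072454")
finished = datetime.fromisoformat("2025-06-04 07:01:52.736556")

# Split the elapsed time into whole minutes and remaining seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```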
Variables
VendorID
Categorical
High correlation  Imbalance 
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 26.5 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 2 |
| 5th row | 2 |
Common Values
| Value | Count | Frequency (%) |
| 2 | 2719860 | 78.3% |
| 1 | 753671 | 21.7% |
| 7 | 1206 | < 0.1% |
| 6 | 489 | < 0.1% |
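The report does not state how the imbalance percentage in the alerts is derived. The figures quoted for VendorID (62.0%) and store_and_fwd_flag (97.4%) are consistent with one minus the normalized Shannon entropy of the class frequencies; the sketch below reproduces them under that assumption. The function name `imbalance_score` is hypothetical and is not taken from the report.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k), with H the Shannon entropy of the class shares.

    Assumes at least two classes; 0 means perfectly balanced, 1 means a
    single class holds every observation.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1 - entropy / math.log(len(counts))

# Class counts copied from the Common Values tables of this report.
vendor_counts = [2_719_860, 753_671, 1_206, 489]
flag_counts = [2_927_431, 7_646]  # non-missing False / True

vendor_score = imbalance_score(vendor_counts)
flag_score = imbalance_score(flag_counts)

print(f"VendorID: {100 * vendor_score:.1f}%")
print(f"store_and_fwd_flag: {100 * flag_score:.1f}%")
```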
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 2719860 | |
| 1 | 753671 | 21.7% |
| 7 | 1206 | < 0.1% |
| 6 | 489 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 3475226 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 2719860 | |
| 1 | 753671 | 21.7% |
| 7 | 1206 | < 0.1% |
| 6 | 489 | < 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 3475226 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 2719860 | |
| 1 | 753671 | 21.7% |
| 7 | 1206 | < 0.1% |
| 6 | 489 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 3475226 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 2719860 | |
| 1 | 753671 | 21.7% |
| 7 | 1206 | < 0.1% |
| 6 | 489 | < 0.1% |
DateTime
| Distinct | 1672077 |
|---|---|
| Distinct (%) | 48.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 26.5 MiB |
| Minimum | 2024-12-31 20:47:55 |
| Maximum | 2025-02-01 00:00:44 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
DateTime
| Distinct | 1671993 |
|---|---|
| Distinct (%) | 48.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 26.5 MiB |
| Minimum | 2024-12-18 07:52:40 |
| Maximum | 2025-02-01 23:44:11 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
passenger_count
Real number (ℝ)
Missing 
| Distinct | 10 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 540149 |
| Missing (%) | 15.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.297859 |
| Minimum | 0 |
|---|---|
| Maximum | 9 |
| Zeros | 24656 |
| Zeros (%) | 0.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 26.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 3 |
| Maximum | 9 |
| Range | 9 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.75075028 |
|---|---|
| Coefficient of variation (CV) | 0.57845289 |
| Kurtosis | 11.314653 |
| Mean | 1.297859 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.0571207 |
| Sum | 3809316 |
| Variance | 0.56362598 |
| Monotonicity | Not monotonic |
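Several of the descriptive statistics above are tied together by simple identities, which makes them easy to cross-check. The sketch below assumes the mean and standard deviation are computed over non-missing rows only (an assumption, though the numbers agree with it) and recomputes the coefficient of variation, the variance, and the sum from the other reported values.

```python
# Figures copied from this report.
n_rows = 3_475_226
n_missing = 540_149
mean = 1.297859
std = 0.75075028

n_valid = n_rows - n_missing   # rows with a passenger_count value
cv = std / mean                # coefficient of variation
variance = std ** 2            # variance is the squared standard deviation
approx_sum = mean * n_valid    # sum recovered from the rounded mean

print(n_valid)  # 2935077
print(round(cv, 6), round(variance, 6), round(approx_sum))
```

The recovered sum differs from the reported 3809316 only by the rounding of the mean.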
| Value | Count | Frequency (%) |
| 1 | 2322434 | 66.8% |
| 2 | 407761 | 11.7% |
| 3 | 91409 | 2.6% |
| 4 | 59009 | 1.7% |
| 0 | 24656 | 0.7% |
| 5 | 17786 | 0.5% |
| 6 | 12004 | 0.3% |
| 8 | 11 | < 0.1% |
| 7 | 4 | < 0.1% |
| 9 | 3 | < 0.1% |
| (Missing) | 540149 | 15.5% |
| Value | Count | Frequency (%) |
| 0 | 24656 | 0.7% |
| 1 | 2322434 | 66.8% |
| 2 | 407761 | 11.7% |
| 3 | 91409 | 2.6% |
| 4 | 59009 | 1.7% |
| 5 | 17786 | 0.5% |
| 6 | 12004 | 0.3% |
| 7 | 4 | < 0.1% |
| 8 | 11 | < 0.1% |
| 9 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 9 | 3 | < 0.1% |
| 8 | 11 | < 0.1% |
| 7 | 4 | < 0.1% |
| 6 | 12004 | 0.3% |
| 5 | 17786 | 0.5% |
| 4 | 59009 | 1.7% |
| 3 | 91409 | 2.6% |
| 2 | 407761 | 11.7% |
| 1 | 2322434 | 66.8% |
| 0 | 24656 | 0.7% |
trip_distance
Real number (ℝ)
High correlation  Skewed  Zeros 
| Distinct | 4545 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.8551262 |
| Minimum | 0 |
|---|---|
| Maximum | 276423.57 |
| Zeros | 90893 |
| Zeros (%) | 2.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 26.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.39 |
| Q1 | 0.98 |
| median | 1.67 |
| Q3 | 3.1 |
| 95-th percentile | 11.83 |
| Maximum | 276423.57 |
| Range | 276423.57 |
| Interquartile range (IQR) | 2.12 |
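The quantiles above give a sense of how far the maximum sits from the bulk of the data. As an illustration only, the sketch below applies Tukey's 1.5 × IQR rule, a common convention that is not part of this report, to the reported quartiles and compares the resulting upper fence with the reported maximum.

```python
# Quartiles and maximum copied from the quantile table of this report.
q1, q3 = 0.98, 3.1
reported_max = 276_423.57

iqr = q3 - q1                   # interquartile range
upper_fence = q3 + 1.5 * iqr    # Tukey's upper fence (illustrative threshold)

print(round(iqr, 2))               # 2.12
print(round(upper_fence, 2))       # 6.28
print(reported_max > upper_fence)  # True
```

Under this rule, any trip longer than about 6.28 would be flagged, so the threshold is probably too aggressive for filtering; it mainly shows that the largest values are many orders of magnitude beyond the typical range.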
Descriptive statistics
| Standard deviation | 564.6016 |
|---|---|
| Coefficient of variation (CV) | 96.428596 |
| Kurtosis | 82905.777 |
| Mean | 5.8551262 |
| Median Absolute Deviation (MAD) | 0.86 |
| Skewness | 260.0643 |
| Sum | 20347887 |
| Variance | 318774.97 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 90893 | 2.6% |
| 0.9 | 43590 | 1.3% |
| 0.8 | 42866 | 1.2% |
| 1 | 42710 | 1.2% |
| 1.1 | 41354 | 1.2% |
| 1.2 | 39968 | 1.2% |
| 0.7 | 39967 | 1.2% |
| 1.3 | 37724 | 1.1% |
| 1.4 | 35433 | 1.0% |
| 0.6 | 35175 | 1.0% |
| Other values (4535) | 3025546 | 87.1% |
| Value | Count | Frequency (%) |
| 0 | 90893 | 2.6% |
| 0.01 | 11118 | 0.3% |
| 0.02 | 3546 | 0.1% |
| 0.03 | 1603 | < 0.1% |
| 0.04 | 1252 | < 0.1% |
| 0.05 | 1106 | < 0.1% |
| 0.06 | 926 | < 0.1% |
| 0.07 | 820 | < 0.1% |
| 0.08 | 727 | < 0.1% |
| 0.09 | 630 | < 0.1% |
| Value | Count | Frequency (%) |
| 276423.57 | 1 | < 0.1% |
| 276099.95 | 1 | < 0.1% |
| 222167.49 | 1 | < 0.1% |
| 206137.99 | 1 | < 0.1% |
| 202771.63 | 1 | < 0.1% |
| 189687.43 | 1 | < 0.1% |
| 181139.99 | 1 | < 0.1% |
| 168079.57 | 1 | < 0.1% |
| 167452.94 | 1 | < 0.1% |
| 164959.95 | 1 | < 0.1% |
RatecodeID
Real number (ℝ)
High correlation  Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 540149 |
| Missing (%) | 15.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.4825345 |
| Minimum | 1 |
|---|---|
| Maximum | 99 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 26.5 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 99 |
| Range | 98 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 11.632772 |
|---|---|
| Coefficient of variation (CV) | 4.685845 |
| Kurtosis | 64.754873 |
| Mean | 2.4825345 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 8.164267 |
| Sum | 7286430 |
| Variance | 135.32138 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 2756472 | 79.3% |
| 2 | 94420 | 2.7% |
| 99 | 41963 | 1.2% |
| 5 | 26501 | 0.8% |
| 3 | 8622 | 0.2% |
| 4 | 7092 | 0.2% |
| 6 | 7 | < 0.1% |
| (Missing) | 540149 | 15.5% |
| Value | Count | Frequency (%) |
| 1 | 2756472 | 79.3% |
| 2 | 94420 | 2.7% |
| 3 | 8622 | 0.2% |
| 4 | 7092 | 0.2% |
| 5 | 26501 | 0.8% |
| 6 | 7 | < 0.1% |
| 99 | 41963 | 1.2% |
| Value | Count | Frequency (%) |
| 99 | 41963 | 1.2% |
| 6 | 7 | < 0.1% |
| 5 | 26501 | 0.8% |
| 4 | 7092 | 0.2% |
| 3 | 8622 | 0.2% |
| 2 | 94420 | 2.7% |
| 1 | 2756472 | 79.3% |
store_and_fwd_flag
Boolean
Imbalance  Missing 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 540149 |
| Missing (%) | 15.5% |
| Memory size | 6.6 MiB |
| Value | Count | Frequency (%) |
| False | 2927431 | 84.2% |
| True | 7646 | 0.2% |
| (Missing) | 540149 | 15.5% |
PULocationID
Real number (ℝ)
| Distinct | 261 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 165.19158 |
| Minimum | 1 |
|---|---|
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 26.5 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 48 |
| Q1 | 132 |
| median | 162 |
| Q3 | 234 |
| 95-th percentile | 249 |
| Maximum | 265 |
| Range | 264 |
| Interquartile range (IQR) | 102 |
Descriptive statistics
| Standard deviation | 64.529483 |
|---|---|
| Coefficient of variation (CV) | 0.39063422 |
| Kurtosis | -0.83359167 |
| Mean | 165.19158 |
| Median Absolute Deviation (MAD) | 62 |
| Skewness | -0.28915174 |
| Sum | 5.7407806 × 10⁸ |
| Variance | 4164.0541 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 161 | 169977 | 4.9% |
| 237 | 163703 | 4.7% |
| 236 | 155647 | 4.5% |
| 132 | 146137 | 4.2% |
| 230 | 125829 | 3.6% |
| 186 | 119131 | 3.4% |
| 162 | 117930 | 3.4% |
| 142 | 110585 | 3.2% |
| 239 | 96614 | 2.8% |
| 163 | 95906 | 2.8% |
| Other values (251) | 2173767 | 62.6% |
| Value | Count | Frequency (%) |
| 1 | 377 | < 0.1% |
| 2 | 6 | < 0.1% |
| 3 | 175 | < 0.1% |
| 4 | 7482 | 0.2% |
| 5 | 3 | < 0.1% |
| 6 | 87 | < 0.1% |
| 7 | 3192 | 0.1% |
| 8 | 22 | < 0.1% |
| 9 | 117 | < 0.1% |
| 10 | 1329 | < 0.1% |
| Value | Count | Frequency (%) |
| 265 | 1380 | < 0.1% |
| 264 | 8141 | 0.2% |
| 263 | 67409 | 1.9% |
| 262 | 49609 | 1.4% |
| 261 | 16651 | 0.5% |
| 260 | 1476 | < 0.1% |
| 259 | 299 | < 0.1% |
| 258 | 708 | < 0.1% |
| 257 | 285 | < 0.1% |
| 256 | 2353 | 0.1% |
DOLocationID
Real number (ℝ)
| Distinct | 260 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 164.12518 |
| Minimum | 1 |
|---|---|
| Maximum | 265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 26.5 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 43 |
| Q1 | 113 |
| median | 162 |
| Q3 | 234 |
| 95-th percentile | 257 |
| Maximum | 265 |
| Range | 264 |
| Interquartile range (IQR) | 121 |
Descriptive statistics
| Standard deviation | 69.401686 |
|---|---|
| Coefficient of variation (CV) | 0.42285826 |
| Kurtosis | -0.9349258 |
| Mean | 164.12518 |
| Median Absolute Deviation (MAD) | 68 |
| Skewness | -0.35704638 |
| Sum | 5.7037208 × 10⁸ |
| Variance | 4816.5941 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 236 | 161376 | 4.6% |
| 237 | 149970 | 4.3% |
| 161 | 131258 | 3.8% |
| 230 | 108177 | 3.1% |
| 170 | 100060 | 2.9% |
| 142 | 98982 | 2.8% |
| 239 | 97559 | 2.8% |
| 162 | 93798 | 2.7% |
| 141 | 92675 | 2.7% |
| 68 | 89232 | 2.6% |
| Other values (250) | 2352139 | 67.7% |
| Value | Count | Frequency (%) |
| 1 | 6873 | 0.2% |
| 2 | 4 | < 0.1% |
| 3 | 312 | < 0.1% |
| 4 | 15012 | 0.4% |
| 5 | 8 | < 0.1% |
| 6 | 89 | < 0.1% |
| 7 | 7376 | 0.2% |
| 8 | 37 | < 0.1% |
| 9 | 346 | < 0.1% |
| 10 | 3365 | 0.1% |
| Value | Count | Frequency (%) |
| 265 | 12086 | 0.3% |
| 264 | 11976 | 0.3% |
| 263 | 73889 | 2.1% |
| 262 | 54149 | 1.6% |
| 261 | 16521 | 0.5% |
| 260 | 2464 | 0.1% |
| 259 | 414 | < 0.1% |
| 258 | 1172 | < 0.1% |
| 257 | 1323 | < 0.1% |
| 256 | 6399 | 0.2% |
payment_type
Real number (ℝ)
Zeros 
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.0366229 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 540149 |
| Zeros (%) | 15.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 26.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.70133341 |
|---|---|
| Coefficient of variation (CV) | 0.67655594 |
| Kurtosis | 5.5771896 |
| Mean | 1.0366229 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.5995141 |
| Sum | 3602499 |
| Variance | 0.49186855 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 2444393 | 70.3% |
| 0 | 540149 | 15.5% |
| 2 | 390429 | 11.2% |
| 4 | 76481 | 2.2% |
| 3 | 23773 | 0.7% |
| 5 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0 | 540149 | 15.5% |
| 1 | 2444393 | 70.3% |
| 2 | 390429 | 11.2% |
| 3 | 23773 | 0.7% |
| 4 | 76481 | 2.2% |
| 5 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 5 | 1 | < 0.1% |
| 4 | 76481 | 2.2% |
| 3 | 23773 | 0.7% |
| 2 | 390429 | 11.2% |
| 1 | 2444393 | 70.3% |
| 0 | 540149 | 15.5% |
fare_amount
Real number (ℝ)
High correlation  Skewed 
| Distinct | 11538 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 17.081803 |
| Minimum | -900 |
|---|---|
| Maximum | 863372.12 |
| Zeros | 1398 |
| Zeros (%) | < 0.1% |
| Negative | 144118 |
| Negative (%) | 4.1% |
| Memory size | 26.5 MiB |
Quantile statistics
| Minimum | -900 |
|---|---|
| 5-th percentile | 3.7 |
| Q1 | 8.6 |
| median | 12.11 |
| Q3 | 19.5 |
| 95-th percentile | 52 |
| Maximum | 863372.12 |
| Range | 864272.12 |
| Interquartile range (IQR) | 10.9 |
Descriptive statistics
| Standard deviation | 463.47292 |
|---|---|
| Coefficient of variation (CV) | 27.132553 |
| Kurtosis | 3464796.3 |
| Mean | 17.081803 |
| Median Absolute Deviation (MAD) | 4.89 |
| Skewness | 1859.9984 |
| Sum | 59363125 |
| Variance | 214807.15 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 8.6 | 150277 | 4.3% |
| 7.9 | 150262 | 4.3% |
| 9.3 | 147090 | 4.2% |
| 7.2 | 144202 | 4.1% |
| 10 | 142048 | 4.1% |
| 10.7 | 133306 | 3.8% |
| 6.5 | 129846 | 3.7% |
| 11.4 | 125316 | 3.6% |
| 12.1 | 116814 | 3.4% |
| 12.8 | 106638 | 3.1% |
| Other values (11528) | 2129427 | 61.3% |
| Value | Count | Frequency (%) |
| -900 | 1 | < 0.1% |
| -850 | 1 | < 0.1% |
| -826.2 | 1 | < 0.1% |
| -700 | 5 | < 0.1% |
| -634.4 | 1 | < 0.1% |
| -600 | 2 | < 0.1% |
| -595.2 | 1 | < 0.1% |
| -579.8 | 1 | < 0.1% |
| -550 | 1 | < 0.1% |
| -541.3 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 863372.12 | 1 | < 0.1% |
| 2450.9 | 1 | < 0.1% |
| 1309.2 | 1 | < 0.1% |
| 950 | 3 | < 0.1% |
| 936.8 | 1 | < 0.1% |
| 900 | 2 | < 0.1% |
| 899.99 | 2 | < 0.1% |
| 893.75 | 1 | < 0.1% |
| 850 | 1 | < 0.1% |
| 826.2 | 1 | < 0.1% |
extra
Real number (ℝ)
Zeros 
| Distinct | 77 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.3177367 |
| Minimum | -7.5 |
|---|---|
| Maximum | 15 |
| Zeros | 1764424 |
| Zeros (%) | 50.8% |
| Negative | 29596 |
| Negative (%) | 0.9% |
| Memory size | 26.5 MiB |
Quantile statistics
| Minimum | -7.5 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 2.5 |
| 95-th percentile | 5 |
| Maximum | 15 |
| Range | 22.5 |
| Interquartile range (IQR) | 2.5 |
Descriptive statistics
| Standard deviation | 1.8615087 |
|---|---|
| Coefficient of variation (CV) | 1.412656 |
| Kurtosis | 2.8436517 |
| Mean | 1.3177367 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.4616594 |
| Sum | 4579432.8 |
| Variance | 3.4652146 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 1764424 | 50.8% |
| 1 | 537935 | 15.5% |
| 2.5 | 532712 | 15.3% |
| 3.25 | 200468 | 5.8% |
| 5 | 109568 | 3.2% |
| 4.25 | 109078 | 3.1% |
| 5.75 | 80765 | 2.3% |
| 3.5 | 34386 | 1.0% |
| 6 | 22872 | 0.7% |
| 7.5 | 16872 | 0.5% |
| Other values (67) | 66146 | 1.9% |
| Value | Count | Frequency (%) |
| -7.5 | 460 | < 0.1% |
| -6 | 779 | < 0.1% |
| -5.75 | 2 | < 0.1% |
| -5.25 | 1 | < 0.1% |
| -5 | 2322 | 0.1% |
| -4.25 | 2 | < 0.1% |
| -3.25 | 7 | < 0.1% |
| -2.5 | 9982 | 0.3% |
| -2 | 1 | < 0.1% |
| -1.75 | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| 15 | 1 | < 0.1% |
| 14.25 | 3 | < 0.1% |
| 13.25 | 1 | < 0.1% |
| 12.5 | 1399 | < 0.1% |
| 11.75 | 736 | < 0.1% |
| 11.5 | 1 | < 0.1% |
| 11 | 1916 | 0.1% |
| 10.75 | 499 | < 0.1% |
| 10.25 | 1184 | < 0.1% |
| 10 | 6414 | 0.2% |
mta_tax
Real number (ℝ)
Zeros 
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.47809905 |
| Minimum | -0.5 |
|---|---|
| Maximum | 10.5 |
| Zeros | 38170 |
| Zeros (%) | 1.1% |
| Negative | 57140 |
| Negative (%) | 1.6% |
| Memory size | 26.5 MiB |
Quantile statistics
| Minimum | -0.5 |
|---|---|
| 5-th percentile | 0.5 |
| Q1 | 0.5 |
| median | 0.5 |
| Q3 | 0.5 |
| 95-th percentile | 0.5 |
| Maximum | 10.5 |
| Range | 11 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.13746227 |
|---|---|
| Coefficient of variation (CV) | 0.28751838 |
| Kurtosis | 83.704664 |
| Mean | 0.47809905 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -5.7552602 |
| Sum | 1661502.2 |
| Variance | 0.018895874 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.5 | 3379839 | 97.3% |
| -0.5 | 57140 | 1.6% |
| 0 | 38170 | 1.1% |
| 1 | 64 | < 0.1% |
| 10.5 | 5 | < 0.1% |
| 4.75 | 3 | < 0.1% |
| 4 | 2 | < 0.1% |
| 3.75 | 2 | < 0.1% |
| 6.5 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| -0.5 | 57140 | 1.6% |
| 0 | 38170 | 1.1% |
| 0.5 | 3379839 | 97.3% |
| 1 | 64 | < 0.1% |
| 3.75 | 2 | < 0.1% |
| 4 | 2 | < 0.1% |
| 4.75 | 3 | < 0.1% |
| 6.5 | 1 | < 0.1% |
| 10.5 | 5 | < 0.1% |
| Value | Count | Frequency (%) |
| 10.5 | 5 | < 0.1% |
| 6.5 | 1 | < 0.1% |
| 4.75 | 3 | < 0.1% |
| 4 | 2 | < 0.1% |
| 3.75 | 2 | < 0.1% |
| 1 | 64 | < 0.1% |
| 0.5 | 3379839 | 97.3% |
| 0 | 38170 | 1.1% |
| -0.5 | 57140 | 1.6% |
tip_amount
Real number (ℝ)
High correlation  Zeros 
| Distinct | 4197 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.9598128 |
| Minimum | -86 |
|---|---|
| Maximum | 400 |
| Zeros | 1118008 |
| Zeros (%) | 32.2% |
| Negative | 124 |
| Negative (%) | < 0.1% |
| Memory size | 26.5 MiB |
Quantile statistics
| Minimum | -86 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2.45 |
| Q3 | 3.93 |
| 95-th percentile | 10 |
| Maximum | 400 |
| Range | 486 |
| Interquartile range (IQR) | 3.93 |
Descriptive statistics
| Standard deviation | 3.7796812 |
|---|---|
| Coefficient of variation (CV) | 1.2770001 |
| Kurtosis | 178.95533 |
| Mean | 2.9598128 |
| Median Absolute Deviation (MAD) | 2.25 |
| Skewness | 5.3446306 |
| Sum | 10286018 |
| Variance | 14.28599 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 1118008 | 32.2% |
| 2 | 156750 | 4.5% |
| 1 | 120556 | 3.5% |
| 3 | 67562 | 1.9% |
| 5 | 39398 | 1.1% |
| 1.5 | 31038 | 0.9% |
| 4 | 29170 | 0.8% |
| 2.95 | 25146 | 0.7% |
| 2.8 | 22460 | 0.6% |
| 3.15 | 22112 | 0.6% |
| Other values (4187) | 1843026 | 53.0% |
| Value | Count | Frequency (%) |
| -86 | 1 | < 0.1% |
| -70 | 1 | < 0.1% |
| -52.45 | 1 | < 0.1% |
| -50.05 | 1 | < 0.1% |
| -33.66 | 1 | < 0.1% |
| -25 | 1 | < 0.1% |
| -19.61 | 1 | < 0.1% |
| -18.84 | 1 | < 0.1% |
| -17 | 1 | < 0.1% |
| -16.34 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 400 | 1 | < 0.1% |
| 360 | 1 | < 0.1% |
| 333.33 | 1 | < 0.1% |
| 333.3 | 1 | < 0.1% |
| 303 | 1 | < 0.1% |
| 285 | 1 | < 0.1% |
| 261 | 1 | < 0.1% |
| 228 | 1 | < 0.1% |
| 225.05 | 1 | < 0.1% |
| 220 | 1 | < 0.1% |
tolls_amount
Real number (ℝ)
High correlation  Zeros 
| Distinct | 1234 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.4493081 |
| Minimum | -126.94 |
|---|---|
| Maximum | 170.94 |
| Zeros | 3259590 |
| Zeros (%) | 93.8% |
| Negative | 4559 |
| Negative (%) | 0.1% |
| Memory size | 26.5 MiB |
Quantile statistics
| Minimum | -126.94 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 6.94 |
| Maximum | 170.94 |
| Range | 297.88 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.0025818 |
|---|---|
| Coefficient of variation (CV) | 4.4570347 |
| Kurtosis | 88.950625 |
| Mean | 0.4493081 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.440691 |
| Sum | 1561447.2 |
| Variance | 4.0103339 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 3259590 | 93.8% |
| 6.94 | 192386 | 5.5% |
| -6.94 | 3738 | 0.1% |
| 14.06 | 2249 | 0.1% |
| 3.18 | 1689 | < 0.1% |
| 16.06 | 1607 | < 0.1% |
| 13.88 | 1263 | < 0.1% |
| 13.38 | 676 | < 0.1% |
| 15.38 | 535 | < 0.1% |
| 11.19 | 520 | < 0.1% |
| Other values (1224) | 10973 | 0.3% |
| Value | Count | Frequency (%) |
| -126.94 | 1 | < 0.1% |
| -96.94 | 1 | < 0.1% |
| -82.69 | 1 | < 0.1% |
| -74.76 | 1 | < 0.1% |
| -48.28 | 1 | < 0.1% |
| -48.18 | 1 | < 0.1% |
| -47.04 | 1 | < 0.1% |
| -45.94 | 1 | < 0.1% |
| -44.94 | 1 | < 0.1% |
| -44.88 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 170.94 | 1 | < 0.1% |
| 126.94 | 1 | < 0.1% |
| 123 | 1 | < 0.1% |
| 105.88 | 1 | < 0.1% |
| 96.94 | 1 | < 0.1% |
| 95 | 1 | < 0.1% |
| 84 | 1 | < 0.1% |
| 82.69 | 1 | < 0.1% |
| 81 | 1 | < 0.1% |
| 80 | 2 | < 0.1% |
improvement_surcharge
Categorical
High correlation  Imbalance 
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 26.5 MiB |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.0171298 |
| Min length | 3 |
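The mean length can be recovered from the value counts, which is a handy check that the column was profiled as text. A minimal sketch, assuming the values are rendered exactly as shown in this report ("1.0", "0.0" and "0.3" with three characters, "-1.0" with four):

```python
# Counts copied from this report.
n_rows = 3_475_226
n_four_chars = 59_530  # rows holding "-1.0", the only four-character value

# Weighted average of string lengths across all rows.
mean_length = (3 * (n_rows - n_four_chars) + 4 * n_four_chars) / n_rows

print(f"{mean_length:.7f}")  # 3.0171298
```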
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
| 1.0 | 3377509 | 97.2% |
| -1.0 | 59530 | 1.7% |
| 0.0 | 37694 | 1.1% |
| 0.3 | 493 | < 0.1% |
Common Values (Plot)
| Value | Count | Frequency (%) |
| 1.0 | 3437039 | |
| 0.0 | 37694 | 1.1% |
| 0.3 | 493 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 3512920 | |
| . | 3475226 | |
| 1 | 3437039 | |
| - | 59530 | 0.6% |
| 3 | 493 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 6950452 | |
| Other Punctuation | 3475226 | |
| Dash Punctuation | 59530 | 0.6% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 3512920 | |
| 1 | 3437039 | |
| 3 | 493 | < 0.1% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 3475226 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 59530 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 10485208 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 3512920 | |
| . | 3475226 | |
| 1 | 3437039 | |
| - | 59530 | 0.6% |
| 3 | 493 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 10485208 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 3512920 | |
| . | 3475226 | |
| 1 | 3437039 | |
| - | 59530 | 0.6% |
| 3 | 493 | < 0.1% |
total_amount
Real number (ℝ)
High correlation  Skewed 
| Distinct | 21995 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 25.611292 |
| Minimum | -901 |
|---|---|
| Maximum | 863380.37 |
| Zeros | 559 |
| Zeros (%) | < 0.1% |
| Negative | 63037 |
| Negative (%) | 1.8% |
| Memory size | 26.5 MiB |
Quantile statistics
| Minimum | -901 |
|---|---|
| 5-th percentile | 8.75 |
| Q1 | 15.2 |
| median | 19.95 |
| Q3 | 27.78 |
| 95-th percentile | 74 |
| Maximum | 863380.37 |
| Range | 864281.37 |
| Interquartile range (IQR) | 12.58 |
Descriptive statistics
| Standard deviation | 463.65848 |
|---|---|
| Coefficient of variation (CV) | 18.103674 |
| Kurtosis | 3459248.6 |
| Mean | 25.611292 |
| Median Absolute Deviation (MAD) | 5.61 |
| Skewness | 1857.7644 |
| Sum | 89005027 |
| Variance | 214979.18 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 17.7 | 23391 | 0.7% |
| 12.6 | 21030 | 0.6% |
| 21.9 | 18725 | 0.5% |
| 13.5 | 18309 | 0.5% |
| 16.86 | 18087 | 0.5% |
| 19.38 | 17651 | 0.5% |
| 16.02 | 17628 | 0.5% |
| 18.54 | 17469 | 0.5% |
| 21.06 | 17186 | 0.5% |
| 15.18 | 17089 | 0.5% |
| Other values (21985) | 3288661 | 94.6% |
| Value | Count | Frequency (%) |
| -901 | 1 | < 0.1% |
| -865.39 | 1 | < 0.1% |
| -851 | 1 | < 0.1% |
| -704.25 | 1 | < 0.1% |
| -701 | 3 | < 0.1% |
| -652.75 | 1 | < 0.1% |
| -633.21 | 1 | < 0.1% |
| -616.36 | 1 | < 0.1% |
| -607.75 | 1 | < 0.1% |
| -601 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 863380.37 | 1 | < 0.1% |
| 2506.71 | 1 | < 0.1% |
| 1311.7 | 1 | < 0.1% |
| 969.05 | 1 | < 0.1% |
| 953.5 | 1 | < 0.1% |
| 951 | 2 | < 0.1% |
| 903.5 | 1 | < 0.1% |
| 903.49 | 2 | < 0.1% |
| 901 | 1 | < 0.1% |
| 896.5 | 1 | < 0.1% |
congestion_surcharge
Categorical
High correlation  Imbalance  Missing 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 540149 |
| Missing (%) | 15.5% |
| Memory size | 26.5 MiB |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.0164633 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2.5 |
|---|---|
| 2nd row | 2.5 |
| 3rd row | 2.5 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
| 2.5 | 2660818 | 76.6% |
| 0.0 | 225938 | 6.5% |
| -2.5 | 48321 | 1.4% |
| (Missing) | 540149 | 15.5% |
Common Values (Plot)
| Value | Count | Frequency (%) |
| 2.5 | 2709139 | |
| 0.0 | 225938 | 7.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| . | 2935077 | |
| 2 | 2709139 | |
| 5 | 2709139 | |
| 0 | 451876 | 5.1% |
| - | 48321 | 0.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 5870154 | |
| Other Punctuation | 2935077 | |
| Dash Punctuation | 48321 | 0.5% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 2709139 | |
| 5 | 2709139 | |
| 0 | 451876 | 7.7% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 2935077 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 48321 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 8853552 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| . | 2935077 | |
| 2 | 2709139 | |
| 5 | 2709139 | |
| 0 | 451876 | 5.1% |
| - | 48321 | 0.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 8853552 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| . | 2935077 | |
| 2 | 2709139 | |
| 5 | 2709139 | |
| 0 | 451876 | 5.1% |
| - | 48321 | 0.5% |
Airport_fee
Real number (ℝ)
High correlation  Missing  Zeros 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 540149 |
| Missing (%) | 15.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.12391106 |
| Minimum | -1.75 |
|---|---|
| Maximum | 6.75 |
| Zeros | 2706446 |
| Zeros (%) | 77.9% |
| Negative | 10411 |
| Negative (%) | 0.3% |
| Memory size | 26.5 MiB |
Quantile statistics
| Minimum | -1.75 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1.75 |
| Maximum | 6.75 |
| Range | 8.5 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.47250898 |
|---|---|
| Coefficient of variation (CV) | 3.8132914 |
| Kurtosis | 8.3496097 |
| Mean | 0.12391106 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.7957272 |
| Sum | 363688.5 |
| Variance | 0.22326474 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 2706446 | 77.9% |
| 1.75 | 218203 | 6.3% |
| -1.75 | 10411 | 0.3% |
| 1.25 | 8 | < 0.1% |
| 5 | 7 | < 0.1% |
| 0.75 | 1 | < 0.1% |
| 6.75 | 1 | < 0.1% |
| (Missing) | 540149 | 15.5% |
| Value | Count | Frequency (%) |
| -1.75 | 10411 | 0.3% |
| 0 | 2706446 | 77.9% |
| 0.75 | 1 | < 0.1% |
| 1.25 | 8 | < 0.1% |
| 1.75 | 218203 | 6.3% |
| 5 | 7 | < 0.1% |
| 6.75 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 6.75 | 1 | < 0.1% |
| 5 | 7 | < 0.1% |
| 1.75 | 218203 | 6.3% |
| 1.25 | 8 | < 0.1% |
| 0.75 | 1 | < 0.1% |
| 0 | 2706446 | 77.9% |
| -1.75 | 10411 | 0.3% |
cbd_congestion_fee
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 26.5 MiB |
Length
| Max length | 5 |
|---|---|
| Median length | 4 |
| Mean length | 3.6502026 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
| 0.75 | 2246495 | 64.6% |
| 0.0 | 1222178 | 35.2% |
| -0.75 | 6553 | 0.2% |
Common Values (Plot)
| Value | Count | Frequency (%) |
| 0.75 | 2253048 | |
| 0.0 | 1222178 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 4697404 | |
| . | 3475226 | |
| 7 | 2253048 | |
| 5 | 2253048 | |
| - | 6553 | 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 9203500 | |
| Other Punctuation | 3475226 | 27.4% |
| Dash Punctuation | 6553 | 0.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 4697404 | |
| 7 | 2253048 | |
| 5 | 2253048 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 3475226 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 6553 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 12685279 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 4697404 | |
| . | 3475226 | |
| 7 | 2253048 | |
| 5 | 2253048 | |
| - | 6553 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 12685279 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 4697404 | |
| . | 3475226 | |
| 7 | 2253048 | |
| 5 | 2253048 | |
| - | 6553 | 0.1% |
Interactions
Correlations
| | Airport_fee | DOLocationID | PULocationID | RatecodeID | VendorID | cbd_congestion_fee | congestion_surcharge | extra | fare_amount | improvement_surcharge | mta_tax | passenger_count | payment_type | store_and_fwd_flag | tip_amount | tolls_amount | total_amount | trip_distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Airport_fee | 1.000 | -0.043 | -0.222 | 0.300 | 0.086 | 0.119 | 0.303 | 0.190 | 0.418 | 0.295 | 0.063 | 0.031 | -0.004 | 0.004 | 0.230 | 0.534 | 0.419 | 0.382 |
| DOLocationID | -0.043 | 1.000 | 0.085 | -0.065 | 0.010 | 0.140 | 0.148 | 0.022 | -0.092 | 0.044 | 0.026 | -0.007 | 0.025 | 0.010 | 0.022 | -0.044 | -0.076 | -0.096 |
| PULocationID | -0.222 | 0.085 | 1.000 | -0.143 | 0.022 | 0.177 | 0.202 | -0.010 | -0.129 | 0.059 | 0.028 | -0.017 | 0.016 | 0.005 | -0.005 | -0.117 | -0.119 | -0.143 |
| RatecodeID | 0.300 | -0.065 | -0.143 | 1.000 | 0.221 | 0.158 | 0.415 | -0.128 | 0.348 | 0.922 | -0.274 | 0.047 | -0.000 | 0.002 | 0.050 | 0.454 | 0.320 | 0.266 |
| VendorID | 0.086 | 0.010 | 0.022 | 0.221 | 1.000 | 0.030 | 0.074 | 0.329 | 0.000 | 0.586 | 0.005 | 0.136 | 0.052 | 0.078 | 0.013 | 0.005 | 0.000 | 0.000 |
| cbd_congestion_fee | 0.119 | 0.140 | 0.177 | 0.158 | 0.030 | 1.000 | 0.376 | 0.206 | 0.000 | 0.277 | 0.001 | 0.020 | 0.172 | 0.003 | 0.021 | 0.024 | 0.000 | 0.000 |
| congestion_surcharge | 0.303 | 0.148 | 0.202 | 0.415 | 0.074 | 0.376 | 1.000 | 0.448 | 0.002 | 0.696 | 0.003 | 0.027 | 0.388 | 0.007 | 0.099 | 0.082 | 0.002 | 0.003 |
| extra | 0.190 | 0.022 | -0.010 | -0.128 | 0.329 | 0.206 | 0.448 | 1.000 | 0.068 | 0.403 | 0.164 | -0.058 | 0.226 | 0.049 | 0.301 | 0.142 | 0.196 | 0.058 |
| fare_amount | 0.418 | -0.092 | -0.129 | 0.348 | 0.000 | 0.000 | 0.002 | 0.068 | 1.000 | 0.000 | 0.096 | 0.046 | -0.080 | 0.000 | 0.358 | 0.386 | 0.956 | 0.790 |
| improvement_surcharge | 0.295 | 0.044 | 0.059 | 0.922 | 0.586 | 0.277 | 0.696 | 0.403 | 0.000 | 1.000 | 0.025 | 0.045 | 0.344 | 0.009 | 0.020 | 0.063 | 0.000 | 0.005 |
| mta_tax | 0.063 | 0.026 | 0.028 | -0.274 | 0.005 | 0.001 | 0.003 | 0.164 | 0.096 | 0.025 | 1.000 | -0.042 | -0.237 | 0.000 | 0.092 | -0.030 | 0.101 | 0.009 |
| passenger_count | 0.031 | -0.007 | -0.017 | 0.047 | 0.136 | 0.020 | 0.027 | -0.058 | 0.046 | 0.045 | -0.042 | 1.000 | 0.034 | 0.035 | 0.012 | 0.040 | 0.043 | 0.037 |
| payment_type | -0.004 | 0.025 | 0.016 | -0.000 | 0.052 | 0.172 | 0.388 | 0.226 | -0.080 | 0.344 | -0.237 | 0.034 | 1.000 | 0.008 | -0.016 | 0.022 | -0.058 | -0.070 |
| store_and_fwd_flag | 0.004 | 0.010 | 0.005 | 0.002 | 0.078 | 0.003 | 0.007 | 0.049 | 0.000 | 0.009 | 0.000 | 0.035 | 0.008 | 1.000 | 0.003 | 0.000 | 0.000 | 0.000 |
| tip_amount | 0.230 | 0.022 | -0.005 | 0.050 | 0.013 | 0.021 | 0.099 | 0.301 | 0.358 | 0.020 | 0.092 | 0.012 | -0.016 | 0.003 | 1.000 | 0.229 | 0.532 | 0.291 |
| tolls_amount | 0.534 | -0.044 | -0.117 | 0.454 | 0.005 | 0.024 | 0.082 | 0.142 | 0.386 | 0.063 | -0.030 | 0.040 | 0.022 | 0.000 | 0.229 | 1.000 | 0.397 | 0.365 |
| total_amount | 0.419 | -0.076 | -0.119 | 0.320 | 0.000 | 0.000 | 0.002 | 0.196 | 0.956 | 0.000 | 0.101 | 0.043 | -0.058 | 0.000 | 0.532 | 0.397 | 1.000 | 0.764 |
| trip_distance | 0.382 | -0.096 | -0.143 | 0.266 | 0.000 | 0.000 | 0.003 | 0.058 | 0.790 | 0.005 | 0.009 | 0.037 | -0.070 | 0.000 | 0.291 | 0.365 | 0.764 | 1.000 |
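To read the matrix above against the report's "High correlation" alerts, it can help to list the pairs whose absolute coefficient clears a cutoff. The sketch below is a minimal illustration using only a hand-copied subset of the coefficients from the table. The 0.5 threshold is an assumption chosen because it reproduces the flagged pairs, and `high_correlation_pairs` is a hypothetical helper, not part of any library.

```python
# Subset of pairwise coefficients copied from the table above.
CORR = {
    ("fare_amount", "total_amount"): 0.956,
    ("RatecodeID", "improvement_surcharge"): 0.922,
    ("fare_amount", "trip_distance"): 0.790,
    ("total_amount", "trip_distance"): 0.764,
    ("congestion_surcharge", "improvement_surcharge"): 0.696,
    ("VendorID", "improvement_surcharge"): 0.586,
    ("Airport_fee", "tolls_amount"): 0.534,
    ("tip_amount", "total_amount"): 0.532,
    ("congestion_surcharge", "extra"): 0.448,
    ("Airport_fee", "PULocationID"): -0.222,
}

def high_correlation_pairs(corr, threshold=0.5):
    """Return (pair, value) tuples with |value| >= threshold, strongest first."""
    hits = [(pair, value) for pair, value in corr.items() if abs(value) >= threshold]
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)

# Assumed cutoff of 0.5; adjust to match the alert settings actually in use.
HIGH = high_correlation_pairs(CORR)
for (left, right), value in HIGH:
    print(f"{left} ~ {right}: {value:.3f}")
```

With this subset, the eight pairs at or above 0.5 are the same pairs named in the report's high-correlation alerts, while `congestion_surcharge` and `extra` (0.448) fall just below the cutoff.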
Missing values
Sample
First rows
| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee | cbd_congestion_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2025-01-01 00:18:38 | 2025-01-01 00:26:59 | 1.0 | 1.60 | 1.0 | N | 229 | 237 | 1 | 10.0 | 3.5 | 0.5 | 3.00 | 0.0 | 1.0 | 18.00 | 2.5 | 0.0 | 0.0 |
| 1 | 1 | 2025-01-01 00:32:40 | 2025-01-01 00:35:13 | 1.0 | 0.50 | 1.0 | N | 236 | 237 | 1 | 5.1 | 3.5 | 0.5 | 2.02 | 0.0 | 1.0 | 12.12 | 2.5 | 0.0 | 0.0 |
| 2 | 1 | 2025-01-01 00:44:04 | 2025-01-01 00:46:01 | 1.0 | 0.60 | 1.0 | N | 141 | 141 | 1 | 5.1 | 3.5 | 0.5 | 2.00 | 0.0 | 1.0 | 12.10 | 2.5 | 0.0 | 0.0 |
| 3 | 2 | 2025-01-01 00:14:27 | 2025-01-01 00:20:01 | 3.0 | 0.52 | 1.0 | N | 244 | 244 | 2 | 7.2 | 1.0 | 0.5 | 0.00 | 0.0 | 1.0 | 9.70 | 0.0 | 0.0 | 0.0 |
| 4 | 2 | 2025-01-01 00:21:34 | 2025-01-01 00:25:06 | 3.0 | 0.66 | 1.0 | N | 244 | 116 | 2 | 5.8 | 1.0 | 0.5 | 0.00 | 0.0 | 1.0 | 8.30 | 0.0 | 0.0 | 0.0 |
| 5 | 2 | 2025-01-01 00:48:24 | 2025-01-01 01:08:26 | 2.0 | 2.63 | 1.0 | N | 239 | 68 | 2 | 19.1 | 1.0 | 0.5 | 0.00 | 0.0 | 1.0 | 24.10 | 2.5 | 0.0 | 0.0 |
| 6 | 1 | 2025-01-01 00:14:47 | 2025-01-01 00:16:15 | 0.0 | 0.40 | 1.0 | N | 170 | 170 | 1 | 4.4 | 3.5 | 0.5 | 2.35 | 0.0 | 1.0 | 11.75 | 2.5 | 0.0 | 0.0 |
| 7 | 1 | 2025-01-01 00:39:27 | 2025-01-01 00:51:51 | 0.0 | 1.60 | 1.0 | N | 234 | 148 | 1 | 12.1 | 3.5 | 0.5 | 2.00 | 0.0 | 1.0 | 19.10 | 2.5 | 0.0 | 0.0 |
| 8 | 1 | 2025-01-01 00:53:43 | 2025-01-01 01:13:23 | 0.0 | 2.80 | 1.0 | N | 148 | 170 | 1 | 19.1 | 3.5 | 0.5 | 3.00 | 0.0 | 1.0 | 27.10 | 2.5 | 0.0 | 0.0 |
| 9 | 2 | 2025-01-01 00:00:02 | 2025-01-01 00:09:36 | 1.0 | 1.71 | 1.0 | N | 237 | 262 | 2 | 11.4 | 1.0 | 0.5 | 0.00 | 0.0 | 1.0 | 16.40 | 2.5 | 0.0 | 0.0 |
Last rows
| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | Airport_fee | cbd_congestion_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3475216 | 2 | 2025-01-31 23:58:20 | 2025-02-01 00:04:17 | NaN | 1.19 | NaN | NaN | 142 | 50 | 0 | 7.65 | 0.0 | 0.5 | 0.0 | 0.0 | 1.0 | 12.40 | NaN | NaN | 0.75 |
| 3475217 | 2 | 2025-01-31 23:38:25 | 2025-01-31 23:46:15 | NaN | 1.34 | NaN | NaN | 234 | 100 | 0 | -4.75 | 0.0 | 0.5 | 0.0 | 0.0 | 1.0 | 5.00 | NaN | NaN | 0.75 |
| 3475218 | 2 | 2025-01-31 23:26:03 | 2025-01-31 23:34:29 | NaN | 1.50 | NaN | NaN | 79 | 90 | 0 | 9.95 | 0.0 | 0.5 | 0.0 | 0.0 | 1.0 | 14.70 | NaN | NaN | 0.75 |
| 3475219 | 2 | 2025-01-31 23:21:00 | 2025-01-31 23:36:00 | NaN | 2.12 | NaN | NaN | 224 | 144 | 0 | 15.15 | 0.0 | 0.5 | 0.0 | 0.0 | 1.0 | 19.90 | NaN | NaN | 0.75 |
| 3475220 | 2 | 2025-01-31 23:26:31 | 2025-01-31 23:40:04 | NaN | 1.85 | NaN | NaN | 90 | 144 | 0 | 13.40 | 0.0 | 0.5 | 0.0 | 0.0 | 1.0 | 18.15 | NaN | NaN | 0.75 |
| 3475221 | 2 | 2025-01-31 23:01:48 | 2025-01-31 23:16:29 | NaN | 3.35 | NaN | NaN | 79 | 237 | 0 | 15.85 | 0.0 | 0.5 | 0.0 | 0.0 | 1.0 | 20.60 | NaN | NaN | 0.75 |
| 3475222 | 2 | 2025-01-31 23:50:29 | 2025-02-01 00:17:27 | NaN | 8.73 | NaN | NaN | 161 | 116 | 0 | 28.14 | 0.0 | 0.5 | 0.0 | 0.0 | 1.0 | 32.89 | NaN | NaN | 0.75 |
| 3475223 | 2 | 2025-01-31 23:26:59 | 2025-01-31 23:43:01 | NaN | 2.64 | NaN | NaN | 144 | 246 | 0 | 14.91 | 0.0 | 0.5 | 0.0 | 0.0 | 1.0 | 19.66 | NaN | NaN | 0.75 |
| 3475224 | 2 | 2025-01-31 23:14:34 | 2025-01-31 23:34:52 | NaN | 3.16 | NaN | NaN | 142 | 107 | 0 | 17.55 | 0.0 | 0.5 | 0.0 | 0.0 | 1.0 | 22.30 | NaN | NaN | 0.75 |
| 3475225 | 2 | 2025-01-31 23:56:42 | 2025-02-01 00:07:27 | NaN | 2.29 | NaN | NaN | 237 | 238 | 0 | 12.09 | 0.0 | 0.5 | 0.0 | 0.0 | 1.0 | 16.09 | NaN | NaN | 0.00 |